High-fidelity reliability simulation of XOR-based erasure codes

Authors

  • Kevin Greenan
  • Jay J. Wylie

Abstract

Erasure codes are the means by which storage systems are typically made reliable. Recent high-profile studies of disk and sector failures indicate that ever more fault-tolerant erasure codes are needed. Many traditional RAID approaches, parity-check array codes (e.g., EVENODD, RDP, and X-code), and MDS codes offer two- and three-disk fault-tolerant schemes. There are also many novel erasure code proposals that provide similar fault tolerance, such as SPC codes, Weaver codes, and low-density parity-check (LDPC) codes. Such erasure codes offer different space-efficiency and performance tradeoffs than traditional erasure codes. Unfortunately, such erasure codes are also distinguished by their irregular fault tolerance: different sets of disk and sector failures of similar sizes may or may not lead to data loss. Some RAID schemes also exhibit irregular fault tolerance. For example, with replicated stripes (RAID 10), some pairs of disk failures lead to data loss and others do not. Reasoning about irregular fault tolerance and reliability is quite challenging. Hafner and Rao have developed Markov models of the reliability of some irregular erasure codes [3]. Elerath and Pecht recently concluded that simulation of a single-disk fault-tolerant storage system with Weibull-distributed failure rates and latent sector failures leads to radically different MTTDL results than Markov models [1]. We go beyond the conclusions of Elerath and Pecht: we believe that Markov models cannot effectively model irregular fault tolerance because of the complexities of correctly modeling disk rebuild, and because latent sector failures and scrubbing should be included in the model [2]. We have developed the High-Fidelity Reliability (HFR) Simulator. The HFR Simulator permits the reliability evaluation of both irregular and traditional erasure codes under a single framework. To achieve high-fidelity simulation, we leverage our prior work on minimal erasures [4].
Minimal erasures concisely and precisely describe the fault tolerance of an irregular erasure code. We used the HFR Simulator to perform the most comprehensive “apples to apples” comparison of the reliability of different erasure codes of which we are aware. We evaluated more than ten different erasure code constructions in the same framework. In the comparison, we evaluate maximum distance separable (MDS) codes (i.e., Reed-Solomon, RAID 4, RAID 6), parity-check array codes (i.e., EVENODD, RDP, X-code, SPC, and Weaver codes), and low-density parity-check (LDPC) codes.
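
The RAID 10 example in the abstract can be made concrete with a small sketch. It is not the HFR Simulator or the authors' algorithm; it is a brute-force illustration that assumes a hypothetical four-disk layout with two mirrored pairs, and it takes a minimal erasure to mean a set of failures that loses data while no proper subset of it does. The names `MIRROR_PAIRS`, `loses_data`, `minimal_erasures`, and `fatal_pairs` are invented for this illustration.

```python
from itertools import combinations

# Assumed layout (not from the paper): four disks as two mirrored pairs.
MIRROR_PAIRS = [(0, 1), (2, 3)]
NUM_DISKS = 4


def loses_data(failed):
    """Data is lost when both disks of any mirrored pair have failed."""
    return any(a in failed and b in failed for a, b in MIRROR_PAIRS)


def minimal_erasures():
    """Return failure sets that lose data and have no data-losing proper subset."""
    found = []
    for size in range(1, NUM_DISKS + 1):
        for combo in combinations(range(NUM_DISKS), size):
            failed = frozenset(combo)
            # Sets are visited by increasing size, so every smaller
            # data-losing set has already been recorded in `found`.
            if loses_data(failed) and not any(m < failed for m in found):
                found.append(failed)
    return found


def fatal_pairs():
    """Return the two-disk failure sets that lose data."""
    return [p for p in combinations(range(NUM_DISKS), 2) if loses_data(set(p))]


if __name__ == "__main__":
    print("minimal erasures:", [sorted(m) for m in minimal_erasures()])
    print("fatal pairs:", fatal_pairs())
```

Under these assumptions, only two of the six possible two-disk failures lose data, which is the irregularity the abstract describes: failure sets of the same size differ in whether they are fatal. Exhaustive enumeration is practical only for very small arrays.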

Similar resources

Adaptive Security And Reliability Using Linear Erasure Correction Codes

This study introduces a novel scientific foundation to manage security and reliability in resource scarce environments such as wireless edge networks. We present encryption techniques based on erasure correction codes. The scheme provides adaptive security and reliability depending on the resource availability, channel error conditions, threat levels, and the application requirements. To increa...

On the Speedup of Recovery in Large-Scale Erasure-Coded Storage Systems (Supplementary File)

Our work focuses on the recovery solutions for XOR-based erasure codes. We point out that regenerating codes [5] have recently been proposed to minimize the recovery bandwidth in distributed storage systems. The idea is that surviving storage nodes compute and transmit linear combinations of their stored data during failure recovery. On the other hand, in XOR-based erasure codes, we do not requi...

Asymptotically MDS Array BP-XOR Codes

Belief propagation or message passing on binary erasure channels (BEC) is a low-complexity decoding algorithm that allows the recovery of message symbols based on a bipartite graph pruning process. Recently, array XOR codes have attracted attention for storage systems due to their burst error recovery performance and easy arithmetic based on Exclusive OR (XOR)-only logic operations. Array BP-XOR...

Rethinking erasure codes for cloud file systems: minimizing I/O for recovery and degraded reads

To reduce storage overhead, cloud file systems are transitioning from replication to erasure codes. This process has revealed new dimensions on which to evaluate the performance of different coding schemes: the amount of data used in recovery and when performing degraded reads. We present an algorithm that finds the optimal number of codeword symbols needed for recovery for any XOR-based erasur...

Belief Propagation Decodable XOR based Erasure Codes For Distributed Storage Systems

LDPC codes and digital fountain techniques have received significant attention from both academia and industry in the past few years. There has also been extensive interest in applying LDPC code techniques to distributed storage systems such as cloud data storage in recent years. This paper carries out the theoretical analysis on the feasibility and performance issues for applying LT codes t...

Publication date: 2008